61 research outputs found

    A calculation method to estimate thermal conductivity of high entropy ceramic for thermal barrier coatings

    High entropy ceramics are highly promising as next-generation thermal barrier coatings due to their unique disordered structure, which imparts ultra-low thermal conductivity and good high-temperature stability. Unlike traditional ceramic materials, the thermal resistance in high entropy ceramics predominantly arises from phonon-disorder scattering rather than phonon-phonon interactions. In this study, we propose a calculation method based on the supercell phonon unfolding (SPU) technique to predict the thermal conductivity of high entropy ceramics, focusing specifically on rocksalt oxide structures. Our prediction method uses the reciprocal of the SPU phonon spectral linewidth as an indicator of phonon lifetime. The results show strong agreement between the predicted thermal conductivities and experimental measurements, validating the feasibility of our calculation method. Furthermore, we extensively investigate and discuss the effects of atomic relaxation and lattice distortion in 5-dopant and 6-dopant rocksalt structures.
    Comment: 19 pages, 8 figures
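
    A minimal sketch of the linewidth-to-lifetime idea described above, assuming per-mode frequencies, group velocities and unfolded linewidths have already been extracted from an SPU calculation. The names, units and the lifetime convention tau = 1/(2*gamma) are illustrative assumptions, not the paper's exact procedure.

        import numpy as np

        HBAR = 1.0545718e-34   # reduced Planck constant, J*s
        KB = 1.380649e-23      # Boltzmann constant, J/K

        def mode_heat_capacity(omega, T):
            # Per-mode Bose-Einstein heat capacity (J/K); omega in rad/s.
            x = HBAR * omega / (KB * T)
            return KB * x**2 * np.exp(x) / np.expm1(x)**2

        def kappa_from_linewidths(omega, v_g, gamma, volume, T=300.0):
            # Kinetic theory: kappa = (1/3V) * sum_q C_q * v_q^2 * tau_q,
            # with the lifetime read off the SPU linewidth as
            # tau = 1/(2*gamma); gamma is the half-width in rad/s
            # (conventions for the factor vary).
            tau = 1.0 / (2.0 * gamma)
            C = mode_heat_capacity(omega, T)
            return float(np.sum(C * v_g**2 * tau) / (3.0 * volume))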

    Forcing you to experience wonder: Unconsciously biasing people’s choice through strategic physical positioning

    Magicians have developed powerful tools to covertly force a spectator to choose a specific card. We investigate the physical location force, in which four cards (from left to right: 1-2-3-4) are placed face-down on the table in a line, after which participants are asked to push out one card. The force is thought to rely on a behavioural bias whereby people are more likely to choose the third card from their left. Participants felt that their choice was extremely free, yet 60% selected the third card. There was no significant difference in estimates and feelings of freedom between those who chose the target card (i.e. the third card) and those who selected a different card, and participants underestimated the actual proportion of people who selected the target card. These results illustrate that participants' behaviour was heavily biased towards choosing the third card, yet they were oblivious to this bias.
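
    For context, a quick way to see how far the reported 60% departs from the 25% chance level of a four-card choice is a one-sided binomial test. The sample size below is a hypothetical stand-in, since the abstract does not report N.

        # Hypothetical check: does a 60% selection rate differ from the
        # 25% expected under random choice among four cards?
        from scipy.stats import binomtest

        n = 100                     # assumed number of participants
        k = round(0.60 * n)         # participants who pushed out the 3rd card
        result = binomtest(k, n, p=0.25, alternative="greater")
        print(f"p-value = {result.pvalue:.2e}")   # far below 0.05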

    Improved speaker independent lip reading using speaker adaptive training and deep neural networks

    Recent improvements in tracking and feature extraction mean that speaker-dependent lip-reading of continuous speech using a medium-sized vocabulary (around 1000 words) is realistic. However, recognising previously unseen speakers has been found to be a very challenging task, because of the large variation in lip shapes across speakers and the lack of large, tracked databases of visual features, which are very expensive to produce. By adapting a technique that is established in speech recognition but has not previously been used in lip-reading, we show that error rates for speaker-independent lip-reading can be very significantly reduced. Furthermore, we show that error rates can be reduced even further by the additional use of Deep Neural Networks (DNNs). We also find that there is no need to map phonemes to visemes for context-dependent visual speech transcription.
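
    A minimal sketch of the adaptation idea borrowed from acoustic speech recognition (speaker adaptive training with per-speaker feature transforms, fMLLR/CMLLR style). The shapes and the identity initialisation are illustrative only; in practice the transform and the shared model are re-estimated in alternation.

        import numpy as np

        def apply_speaker_transform(feats, A, b):
            # fMLLR/CMLLR-style per-speaker affine transform: x' = A x + b.
            return feats @ A.T + b

        # Toy usage: 100 frames of 40-dimensional lip features, one speaker.
        rng = np.random.default_rng(0)
        feats = rng.normal(size=(100, 40))
        A, b = np.eye(40), np.zeros(40)   # re-estimated per speaker in SAT
        adapted = apply_speaker_transform(feats, A, b)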

    Finding phonemes: improving machine lip-reading

    In machine lip-reading there is continued debate and research around the correct classes to use for recognition. In this paper we use a structured approach for devising speaker-dependent viseme classes, which enables the creation of a set of phoneme-to-viseme maps in which each map has a different number of visemes, ranging from 2 to 45. Viseme classes are based upon mapping articulated phonemes that are confused during phoneme recognition into viseme groups. Using these maps with the LiLIR dataset, we show the effect of changing the viseme map size in speaker-dependent machine lip-reading, measured by word recognition correctness, and so demonstrate that word recognition with phoneme classifiers is not just possible, but often better than word recognition with viseme classifiers. Furthermore, there are intermediate units between visemes and phonemes which are better still.
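
    One plausible reading of the map-devising step in code: start from a phoneme confusion matrix produced by a phoneme recogniser and greedily merge the most mutually confused groups until the desired number of viseme classes remains. The greedy rule below is an assumption for illustration; the paper's exact grouping procedure may differ.

        import numpy as np

        def build_viseme_map(confusion, phonemes, n_visemes):
            # confusion[i, j]: how often phoneme i was recognised as j.
            sym = confusion + confusion.T        # symmetrise confusions
            clusters = [[i] for i in range(len(phonemes))]
            while len(clusters) > n_visemes:
                best, pair = -1.0, (0, 1)
                for a in range(len(clusters)):
                    for b in range(a + 1, len(clusters)):
                        score = sum(sym[i, j]
                                    for i in clusters[a]
                                    for j in clusters[b])
                        if score > best:
                            best, pair = score, (a, b)
                a, b = pair
                clusters[a].extend(clusters.pop(b))
            return {phonemes[i]: v
                    for v, cl in enumerate(clusters) for i in cl}

        # Toy usage: three phonemes collapse to two viseme classes.
        conf = np.array([[5, 4, 0],
                         [4, 6, 1],
                         [0, 1, 7]])
        print(build_viseme_map(conf, ["p", "b", "f"], n_visemes=2))
        # {'p': 0, 'b': 0, 'f': 1}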

    Harnessing AI for Speech Reconstruction using Multi-view Silent Video Feed

    Speechreading, or lipreading, is the technique of understanding speech and extracting phonetic features from a speaker's visual cues, such as movement of the lips, face, teeth and tongue. It has a wide range of multimedia applications, such as surveillance, Internet telephony, and aids for people with hearing impairments. However, most work in speechreading has been limited to generating text from silent videos. Recently, research has started venturing into generating (audio) speech from silent video sequences, but there have been no developments thus far in dealing with divergent views and poses of a speaker. Thus, although multiple camera feeds of a speaker's speech may be available, these feeds have not been exploited to deal with different poses. To this end, this paper presents the world's first multi-view speechreading and reconstruction system. This work pushes the boundaries of multimedia research by putting forth a model which leverages silent video feeds from multiple cameras recording the same subject to generate intelligible speech for a speaker. Initial results confirm the usefulness of exploiting multiple camera views in building an efficient speechreading and reconstruction system. The work further shows the optimal placement of cameras that leads to the maximum intelligibility of speech, and lays out various innovative applications for the proposed system, focusing on its potential impact not just in the security arena but in many other multimedia analytics problems.
    Comment: 2018 ACM Multimedia Conference (MM '18), October 22-26, 2018, Seoul, Republic of Korea
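
    A hedged PyTorch sketch of the fusion idea only, not the paper's architecture: encode each camera view separately and merge the per-view embeddings before decoding speech. The layer sizes, the concatenation rule and the LazyLinear encoders are placeholders.

        import torch
        import torch.nn as nn

        class MultiViewFusion(nn.Module):
            # One encoder per camera view; embeddings are concatenated and
            # projected into a single representation for a speech decoder.
            def __init__(self, n_views=2, feat_dim=512):
                super().__init__()
                self.encoders = nn.ModuleList(
                    nn.Sequential(nn.Flatten(),
                                  nn.LazyLinear(feat_dim), nn.ReLU())
                    for _ in range(n_views))
                self.fuse = nn.Linear(n_views * feat_dim, feat_dim)

            def forward(self, views):
                # views: one tensor per camera, e.g. [B, C, H, W] each.
                z = [enc(v) for enc, v in zip(self.encoders, views)]
                return self.fuse(torch.cat(z, dim=-1))

        # Toy usage: two 64x64 RGB views of the same speaker.
        views = [torch.randn(8, 3, 64, 64), torch.randn(8, 3, 64, 64)]
        emb = MultiViewFusion()(views)   # -> [8, 512] fused embedding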

    Resolution limits on visual speech recognition

    Visual-only speech recognition depends upon a number of factors that can be difficult to control, such as lighting, identity, motion, emotion and expression. But some factors, such as video resolution, are controllable, so it is surprising that there is not yet a systematic study of the effect of resolution on lip-reading. Here we use a new data set, the Rosetta Raven data, to train and test recognizers so we can measure the effect of video resolution on recognition accuracy. We conclude that, contrary to common practice, resolution need not be very high for automatic lip-reading. However, it is highly unlikely that automatic lip-reading can work reliably when the distance between the bottom of the lower lip and the top of the upper lip is less than four pixels at rest.
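
    The four-pixel limit above lends itself to a quick pre-flight check: given a measured resting lip aperture at the native resolution, estimate how many pixels survive a downscale. The numbers in the usage line are made up for illustration.

        def lip_height_after_downscale(lip_height_px, orig_height,
                                       target_height):
            # Resting lip aperture scales with the video resize factor.
            return lip_height_px * (target_height / orig_height)

        h = lip_height_after_downscale(lip_height_px=12, orig_height=1080,
                                       target_height=270)
        print(f"lip height ~ {h:.1f}px:",
              "usable" if h >= 4 else "below the limit")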

    Which phoneme-to-viseme maps best improve visual-only computer lip-reading?

    A critical assumption of all current visual speech recognition systems is that there are visual speech units, called visemes, which can be mapped to units of acoustic speech, the phonemes. Although a number of maps have been published, it is rare to see their effectiveness tested, particularly on visual-only lip-reading (many works use audio-visual speech). Here we examine 120 mappings and consider whether any are stable across talkers. We show a method for devising maps based on phoneme confusions from an automated lip-reading system, and we present new mappings that show improvements for individual talkers.
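
    To make the testing side concrete, this is what applying any such map to a phoneme transcription looks like. The tiny map below is invented for illustration and is not one of the 120 mappings examined.

        # A made-up two-class fragment of a phoneme-to-viseme map.
        p2v = {"p": "V1", "b": "V1", "m": "V1",   # bilabials look alike
               "f": "V2", "v": "V2"}              # labiodentals, another class

        def to_visemes(phonemes, p2v, unknown="V?"):
            # Relabel a phoneme transcription with viseme classes.
            return [p2v.get(p, unknown) for p in phonemes]

        print(to_visemes(["p", "a", "m", "f"], p2v))
        # ['V1', 'V?', 'V1', 'V2']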

    ControlCom: Controllable Image Composition using Diffusion Model

    Image composition aims to synthesize a realistic composite image from a pair of foreground and background images. Recently, generative composition methods have been built on large pretrained diffusion models to generate composite images, given their great potential in image generation. However, they suffer from a lack of controllability over foreground attributes and poor preservation of foreground identity. To address these challenges, we propose a controllable image composition method that unifies four tasks in one diffusion model: image blending, image harmonization, view synthesis, and generative composition. Meanwhile, we design a self-supervised training framework coupled with a tailored pipeline for training data preparation. Moreover, we propose a local enhancement module to enhance the foreground details in the diffusion model, improving the foreground fidelity of composite images. The proposed method is evaluated on both public benchmarks and real-world data, demonstrating that it can generate more faithful and controllable composite images than existing approaches. The code and model will be available at https://github.com/bcmi/ControlCom-Image-Composition
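
    For contrast with the learned approach, the task definition reduces to the naive cut-and-paste baseline below (Pillow). ControlCom replaces this with a diffusion model that also harmonises appearance and adjusts viewpoint; the paths and box argument here are placeholders.

        from PIL import Image

        def naive_composite(bg_path, fg_path, box):
            # box = (left, top, right, bottom): target background region.
            bg = Image.open(bg_path).convert("RGB")
            fg = Image.open(fg_path).convert("RGBA")
            fg = fg.resize((box[2] - box[0], box[3] - box[1]))
            bg.paste(fg, box[:2], mask=fg)   # alpha channel as paste mask
            return bg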